Overview

Dataset statistics

Number of variables: 13
Number of observations: 27323
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 2
Duplicate rows (%): < 0.1%
Total size in memory: 2.7 MiB
Average record size in memory: 104.0 B

Variable types

NUM: 10
CAT: 3

Reproduction

Analysis started: 2020-06-05 07:45:55.800871
Analysis finished: 2020-06-05 07:47:13.350306
Duration: 1 minute and 17.55 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 2 (< 0.1%) duplicate rows [Duplicates]
Date has a high cardinality: 253 distinct values [High cardinality]
region has a high cardinality: 78 distinct values [High cardinality]
4046 is highly correlated with Total Volume and 3 other fields [High correlation]
Total Volume is highly correlated with 4046 and 3 other fields [High correlation]
4225 is highly correlated with Total Volume and 2 other fields [High correlation]
Total Bags is highly correlated with Total Volume and 3 other fields [High correlation]
Small Bags is highly correlated with Total Volume and 3 other fields [High correlation]
Large Bags is highly correlated with Total Bags [High correlation]
Date is uniformly distributed [Uniform]
4046 has 345 (1.3%) zeros [Zeros]
4770 has 8496 (31.1%) zeros [Zeros]
Large Bags has 2952 (10.8%) zeros [Zeros]
XLarge Bags has 16567 (60.6%) zeros [Zeros]
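
The percentages in the Zeros warnings can be reproduced from the counts and the number of observations. The following is a minimal sketch, not the report generator's own code; the dictionary and variable names are illustrative.

```python
# Zero counts and observation total as reported above.
n_observations = 27323
zero_counts = {
    "4046": 345,
    "4770": 8496,
    "Large Bags": 2952,
    "XLarge Bags": 16567,
}

# Share of zeros per column, in percent, rounded to one decimal
# to match the precision shown in the report.
zero_share = {
    column: round(100 * count / n_observations, 1)
    for column, count in zero_counts.items()
}

for column, share in zero_share.items():
    print(f"{column}: {share}% zeros")
```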

Variables

Date
Categorical

HIGH CARDINALITY
UNIFORM

Distinct count: 253
Unique (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 213.5 KiB

Value | Count | Frequency (%)
2017-03-05 | 109 | 0.4%
2017-02-26 | 109 | 0.4%
2019-08-18 | 108 | 0.4%
2018-01-21 | 108 | 0.4%
2019-07-28 | 108 | 0.4%
2019-07-21 | 108 | 0.4%
2019-01-27 | 108 | 0.4%
2017-03-19 | 108 | 0.4%
2015-11-01 | 108 | 0.4%
2015-01-04 | 108 | 0.4%
Other values (243) | 26241 | 96.0%

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

AveragePrice
Real number (ℝ≥0)

Distinct count: 260
Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.400632434212934
Minimum: 0.44
Maximum: 3.25
Zeros: 0
Zeros (%): 0.0%
Memory size: 213.5 KiB

Quantile statistics

Minimum: 0.44
5-th percentile: 0.85
Q1: 1.11
median: 1.37
Q3: 1.64
95-th percentile: 2.09
Maximum: 3.25
Range: 2.81
Interquartile range (IQR): 0.53

Descriptive statistics

Standard deviation: 0.3854387199
Coefficient of variation (CV): 0.2751890578
Kurtosis: 0.4001088522
Mean: 1.400632434
Median Absolute Deviation (MAD): 0.26
Skewness: 0.5980915978
Sum: 38269.48
Variance: 0.1485630068
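
Several of the statistics above are derived from others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below checks these relationships against the AveragePrice figures; the variable names are illustrative and the inputs are the rounded values printed in the report, so the results agree only up to rounding.

```python
# Reported AveragePrice statistics (rounded as printed in the report).
minimum, maximum = 0.44, 3.25
q1, q3 = 1.11, 1.64
mean, std = 1.400632434, 0.3854387199

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance

print(f"range={value_range:.2f} iqr={iqr:.2f} cv={cv:.6f} variance={variance:.6f}")
```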
Histogram with fixed size bins (bins=10)
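Common values
Value | Count | Frequency (%)
1.15 | 310 | 1.1%
1.19 | 309 | 1.1%
1.16 | 303 | 1.1%
1.14 | 298 | 1.1%
1.26 | 294 | 1.1%
1.25 | 290 | 1.1%
1.18 | 284 | 1.0%
1.23 | 283 | 1.0%
1.44 | 282 | 1.0%
1.13 | 282 | 1.0%
Other values (250) | 24388 | 89.3%

Minimum 5 values
Value | Count | Frequency (%)
0.44 | 1 | < 0.1%
0.46 | 1 | < 0.1%
0.48 | 1 | < 0.1%
0.49 | 2 | < 0.1%
0.5 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
3.25 | 1 | < 0.1%
3.17 | 1 | < 0.1%
3.12 | 1 | < 0.1%
3.05 | 1 | < 0.1%
3.04 | 1 | < 0.1%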

Total Volume
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 27296
Unique (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 913546.9352911466
Minimum: 84.56
Maximum: 63716144.15
Zeros: 0
Zeros (%): 0.0%
Memory size: 213.5 KiB

Quantile statistics

Minimum: 84.56
5-th percentile: 2928.631
Q1: 13614.12
median: 119865.41
Q3: 474720.52
95-th percentile: 4082635.401
Maximum: 63716144.15
Range: 63716059.59
Interquartile range (IQR): 461106.4

Descriptive statistics

Standard deviation: 3702672.272
Coefficient of variation (CV): 4.05307284
Kurtosis: 93.12071134
Mean: 913546.9353
Median Absolute Deviation (MAD): 114173.16
Skewness: 9.067931485
Sum: 2.496084291e+10
Variance: 1.370978195e+13
Histogram with fixed size bins (bins=10)
Common values
Value | Count | Frequency (%)
9465.99 | 2 | < 0.1%
3529.44 | 2 | < 0.1%
19634.24 | 2 | < 0.1%
46602.16 | 2 | < 0.1%
6022.12 | 2 | < 0.1%
16507 | 2 | < 0.1%
12809.78 | 2 | < 0.1%
7223.46 | 2 | < 0.1%
256548.81 | 2 | < 0.1%
3713.49 | 2 | < 0.1%
Other values (27286) | 27303 | 99.9%

Minimum 5 values
Value | Count | Frequency (%)
84.56 | 1 | < 0.1%
253.45 | 1 | < 0.1%
331.19 | 1 | < 0.1%
336.95 | 1 | < 0.1%
338.22 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
63716144.15 | 1 | < 0.1%
62505646.52 | 1 | < 0.1%
62451514.93 | 1 | < 0.1%
61034457.1 | 1 | < 0.1%
52288697.89 | 1 | < 0.1%

4046
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct count: 26361
Unique (%): 96.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 295318.73312191194
Minimum: 0.0
Maximum: 22743616.17
Zeros: 345
Zeros (%): 1.3%
Memory size: 213.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 25.605
Q1: 796.425
median: 10037.85
Q3: 113317.895
95-th percentile: 1244389.435
Maximum: 22743616.17
Range: 22743616.17
Interquartile range (IQR): 112521.47

Descriptive statistics

Standard deviation: 1273010.158
Coefficient of variation (CV): 4.310631244
Kurtosis: 87.25838413
Mean: 295318.7331
Median Absolute Deviation (MAD): 10010.49
Skewness: 8.694985241
Sum: 8068993745
Variance: 1.620554862e+12
Histogram with fixed size bins (bins=10)
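Common values
Value | Count | Frequency (%)
0 | 345 | 1.3%
3 | 19 | 0.1%
4 | 13 | < 0.1%
1 | 13 | < 0.1%
1.24 | 9 | < 0.1%
6 | 8 | < 0.1%
1.25 | 7 | < 0.1%
1.3 | 6 | < 0.1%
1.21 | 6 | < 0.1%
1.27 | 6 | < 0.1%
Other values (26351) | 26891 | 98.4%

Minimum 5 values
Value | Count | Frequency (%)
0 | 345 | 1.3%
1 | 13 | < 0.1%
1.13 | 1 | < 0.1%
1.19 | 3 | < 0.1%
1.2 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
22743616.17 | 1 | < 0.1%
21620180.9 | 1 | < 0.1%
21137400.46 | 1 | < 0.1%
19498919.53 | 1 | < 0.1%
18933038.04 | 1 | < 0.1%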

4225
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 26947
Unique (%): 98.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 290105.93422427995
Minimum: 0.0
Maximum: 20470572.61
Zeros: 172
Zeros (%): 0.6%
Memory size: 213.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 107.825
Q1: 2922.98
median: 25688.49
Q3: 145446.395
95-th percentile: 1225194.379
Maximum: 20470572.61
Range: 20470572.61
Interquartile range (IQR): 142523.415

Descriptive statistics

Standard deviation: 1187227.331
Coefficient of variation (CV): 4.092392437
Kurtosis: 90.56766958
Mean: 290105.9342
Median Absolute Deviation (MAD): 25203.12
Skewness: 8.868402759
Sum: 7926564441
Variance: 1.409508736e+12
Histogram with fixed size bins (bins=10)
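Common values
Value | Count | Frequency (%)
0 | 172 | 0.6%
1 | 17 | 0.1%
2 | 6 | < 0.1%
1.26 | 4 | < 0.1%
2.75 | 3 | < 0.1%
177.87 | 3 | < 0.1%
5.85 | 3 | < 0.1%
10.91 | 3 | < 0.1%
215.36 | 3 | < 0.1%
5.44 | 3 | < 0.1%
Other values (26937) | 27106 | 99.2%

Minimum 5 values
Value | Count | Frequency (%)
0 | 172 | 0.6%
1 | 17 | 0.1%
1.26 | 4 | < 0.1%
1.28 | 2 | < 0.1%
1.3 | 3 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
20470572.61 | 1 | < 0.1%
20445501.03 | 1 | < 0.1%
20328161.55 | 1 | < 0.1%
19900871.87 | 1 | < 0.1%
18956479.74 | 1 | < 0.1%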

4770
Real number (ℝ≥0)

ZEROS

Distinct count: 17512
Unique (%): 64.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22158.67818907148
Minimum: 0.0
Maximum: 2546439.11
Zeros: 8496
Zeros (%): 31.1%
Memory size: 213.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 192.69
Q3: 5898.3
95-th percentile: 103556.366
Maximum: 2546439.11
Range: 2546439.11
Interquartile range (IQR): 5898.3

Descriptive statistics

Standard deviation: 103132.8556
Coefficient of variation (CV): 4.65428735
Kurtosis: 123.5429418
Mean: 22158.67819
Median Absolute Deviation (MAD): 192.69
Skewness: 9.77606234
Sum: 605441564.2
Variance: 1.06363859e+10
Histogram with fixed size bins (bins=10)
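Common values
Value | Count | Frequency (%)
0 | 8496 | 31.1%
2.66 | 11 | < 0.1%
4 | 9 | < 0.1%
1.57 | 8 | < 0.1%
9 | 8 | < 0.1%
1.65 | 7 | < 0.1%
3 | 7 | < 0.1%
1.6 | 7 | < 0.1%
2 | 7 | < 0.1%
3.32 | 7 | < 0.1%
Other values (17502) | 18756 | 68.6%

Minimum 5 values
Value | Count | Frequency (%)
0 | 8496 | 31.1%
0.83 | 1 | < 0.1%
1 | 5 | < 0.1%
1.01 | 1 | < 0.1%
1.09 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
2546439.11 | 1 | < 0.1%
1993645.36 | 1 | < 0.1%
1896149.5 | 1 | < 0.1%
1880231.38 | 1 | < 0.1%
1811090.71 | 1 | < 0.1%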

Total Bags
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 27148
Unique (%): 99.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 305873.958118435
Minimum: 0.0
Maximum: 23472988.69
Zeros: 15
Zeros (%): 0.1%
Memory size: 213.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 916.947
Q1: 7703.605
median: 47750.39
Q3: 146102.06
95-th percentile: 1289527.221
Maximum: 23472988.69
Range: 23472988.69
Interquartile range (IQR): 138398.455

Descriptive statistics

Standard deviation: 1274850.96
Coefficient of variation (CV): 4.16789637
Kurtosis: 119.753632
Mean: 305873.9581
Median Absolute Deviation (MAD): 44335.67
Skewness: 10.08876168
Sum: 8357394158
Variance: 1.62524497e+12
Histogram with fixed size bins (bins=10)
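Common values
Value | Count | Frequency (%)
0 | 15 | 0.1%
300 | 5 | < 0.1%
990 | 5 | < 0.1%
550 | 4 | < 0.1%
916.67 | 4 | < 0.1%
266.67 | 4 | < 0.1%
453.33 | 4 | < 0.1%
436.67 | 3 | < 0.1%
613.33 | 3 | < 0.1%
803.33 | 3 | < 0.1%
Other values (27138) | 27273 | 99.8%

Minimum 5 values
Value | Count | Frequency (%)
0 | 15 | 0.1%
3.09 | 1 | < 0.1%
3.11 | 1 | < 0.1%
3.19 | 1 | < 0.1%
3.33 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
23472988.69 | 1 | < 0.1%
21625372.67 | 1 | < 0.1%
20597427.49 | 1 | < 0.1%
20597401.03 | 1 | < 0.1%
19733973.9 | 1 | < 0.1%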

Small Bags
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 26348
Unique (%): 96.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 218698.1972565238
Minimum: 0.0
Maximum: 15436246.72
Zeros: 159
Zeros (%): 0.6%
Memory size: 213.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 406.67
Q1: 5283.05
median: 32231.5
Q3: 104842.35
95-th percentile: 1007995.497
Maximum: 15436246.72
Range: 15436246.72
Interquartile range (IQR): 99559.3

Descriptive statistics

Standard deviation: 888129.2037
Coefficient of variation (CV): 4.060980908
Kurtosis: 106.4973823
Mean: 218698.1973
Median Absolute Deviation (MAD): 30926.24
Skewness: 9.576043389
Sum: 5975490844
Variance: 7.887734825e+11
Histogram with fixed size bins (bins=10)
Common values
Value | Count | Frequency (%)
0 | 159 | 0.6%
203.33 | 11 | < 0.1%
223.33 | 10 | < 0.1%
533.33 | 10 | < 0.1%
196.67 | 8 | < 0.1%
103.33 | 8 | < 0.1%
263.33 | 8 | < 0.1%
326.67 | 8 | < 0.1%
300 | 8 | < 0.1%
216.67 | 8 | < 0.1%
Other values (26338) | 27085 | 99.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 159 | 0.6%
2.52 | 1 | < 0.1%
2.57 | 1 | < 0.1%
2.73 | 1 | < 0.1%
2.79 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
15436246.72 | 1 | < 0.1%
15264523.33 | 1 | < 0.1%
13384586.8 | 1 | < 0.1%
13377154.27 | 1 | < 0.1%
13110016.21 | 1 | < 0.1%

Large Bags
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct count: 23042
Unique (%): 84.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 82025.37220766387
Minimum: 0.0
Maximum: 8378355.78
Zeros: 2952
Zeros (%): 10.8%
Memory size: 213.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 277.37
median: 4312.49
Q3: 32684.94
95-th percentile: 301699.133
Maximum: 8378355.78
Range: 8378355.78
Interquartile range (IQR): 32407.57

Descriptive statistics

Standard deviation: 391735.6238
Coefficient of variation (CV): 4.775785994
Kurtosis: 160.8942619
Mean: 82025.37221
Median Absolute Deviation (MAD): 4312.49
Skewness: 11.30554045
Sum: 2241179245
Variance: 1.534567989e+11
Histogram with fixed size bins (bins=10)
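Common values
Value | Count | Frequency (%)
0 | 2952 | 10.8%
3.33 | 256 | 0.9%
6.67 | 125 | 0.5%
10 | 78 | 0.3%
13.33 | 49 | 0.2%
4.44 | 40 | 0.1%
16.67 | 32 | 0.1%
6.66 | 28 | 0.1%
20 | 25 | 0.1%
26.67 | 24 | 0.1%
Other values (23032) | 23714 | 86.8%

Minimum 5 values
Value | Count | Frequency (%)
0 | 2952 | 10.8%
0.97 | 1 | < 0.1%
1.3 | 1 | < 0.1%
1.33 | 1 | < 0.1%
1.35 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
8378355.78 | 1 | < 0.1%
7958753.83 | 1 | < 0.1%
7864297.23 | 1 | < 0.1%
7806415.69 | 1 | < 0.1%
7790540.1 | 1 | < 0.1%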

XLarge Bags
Real number (ℝ≥0)

ZEROS

Distinct count: 9115
Unique (%): 33.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5150.387570910954
Minimum: 0.0
Maximum: 844929.83
Zeros: 16567
Zeros (%): 60.6%
Memory size: 213.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 450.665
95-th percentile: 17612.705
Maximum: 844929.83
Range: 844929.83
Interquartile range (IQR): 450.665

Descriptive statistics

Standard deviation: 30719.20777
Coefficient of variation (CV): 5.964445849
Kurtosis: 226.5085073
Mean: 5150.387571
Median Absolute Deviation (MAD): 0
Skewness: 13.1689564
Sum: 140724039.6
Variance: 943669725.8
Histogram with fixed size bins (bins=10)
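Common values
Value | Count | Frequency (%)
0 | 16567 | 60.6%
3.33 | 152 | 0.6%
6.67 | 100 | 0.4%
10 | 57 | 0.2%
13.33 | 42 | 0.2%
1.11 | 29 | 0.1%
20 | 29 | 0.1%
16.67 | 18 | 0.1%
2.22 | 18 | 0.1%
5 | 15 | 0.1%
Other values (9105) | 10296 | 37.7%

Minimum 5 values
Value | Count | Frequency (%)
0 | 16567 | 60.6%
1 | 2 | < 0.1%
1.11 | 29 | 0.1%
1.26 | 1 | < 0.1%
1.3 | 3 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
844929.83 | 1 | < 0.1%
751144.1 | 1 | < 0.1%
745488.94 | 1 | < 0.1%
717175.84 | 1 | < 0.1%
716104.2 | 1 | < 0.1%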

type
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 213.5 KiB

Value | Count | Frequency (%)
conventional | 13662 | 50.0%
organic | 13661 | 50.0%

Length

Max length: 12
Median length: 12
Mean length: 9.500091498
Min length: 7

year
Real number (ℝ≥0)

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2016.956593346265
Minimum: 2015
Maximum: 2019
Zeros: 0
Zeros (%): 0.0%
Memory size: 213.5 KiB

Quantile statistics

Minimum: 2015
5-th percentile: 2015
Q1: 2016
median: 2017
Q3: 2018
95-th percentile: 2019
Maximum: 2019
Range: 4
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.406538837
Coefficient of variation (CV): 0.000697357019
Kurtosis: -1.282566679
Mean: 2016.956593
Median Absolute Deviation (MAD): 1
Skewness: 0.04300041133
Sum: 55109305
Variance: 1.978351501
Histogram with fixed size bins (bins=10)
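Common values
Value | Count | Frequency (%)
2017 | 5616 | 20.6%
2016 | 5616 | 20.6%
2015 | 5615 | 20.6%
2018 | 5292 | 19.4%
2019 | 5184 | 19.0%

Minimum 5 values
Value | Count | Frequency (%)
2015 | 5615 | 20.6%
2016 | 5616 | 20.6%
2017 | 5616 | 20.6%
2018 | 5292 | 19.4%
2019 | 5184 | 19.0%

Maximum 5 values
Value | Count | Frequency (%)
2019 | 5184 | 19.0%
2018 | 5292 | 19.4%
2017 | 5616 | 20.6%
2016 | 5616 | 20.6%
2015 | 5615 | 20.6%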

region
Categorical

HIGH CARDINALITY

Distinct count: 78
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 213.5 KiB

Value | Count | Frequency (%)
Chicago | 506 | 1.9%
Boston | 506 | 1.9%
Spokane | 506 | 1.9%
Denver | 506 | 1.9%
Seattle | 506 | 1.9%
Nashville | 506 | 1.9%
Houston | 506 | 1.9%
Orlando | 506 | 1.9%
Plains | 506 | 1.9%
Albany | 506 | 1.9%
Other values (68) | 22263 | 81.5%

Length

Max length: 20
Median length: 9
Mean length: 10.81103832
Min length: 4

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the steepness of its angle to the x-axis does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
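
The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report uses; the function name `pearson_r` and the sample data are assumptions made for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and sample standard deviations (n - 1 denominator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (std_x * std_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [2 * v + 10 for v in x]      # steep increasing line
y_down = [-0.5 * v + 3 for v in x]  # shallow decreasing line

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(r_up, r_down)
```

Both lines give |r| close to 1 regardless of slope and offset, which illustrates the invariance under changes in location and scale. A constant input has zero standard deviation, so this sketch would raise a division error in that case.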

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
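
As a minimal sketch of this definition, the values can be replaced by their ranks and then passed to a Pearson computation. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the sample data are illustrative assumptions; ties receive the average of their positions, which is one common convention.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear

rho = spearman_rho(x, y)
r = pearson_r(x, y)
tied_ranks = average_ranks([10, 20, 20, 30])
print(rho, r, tied_ranks)
```

For this cubic relationship ρ equals 1 while r stays below 1, which is the sense in which ρ catches nonlinear monotonic association.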

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the difference between the number of concordant pairs and the number of discordant pairs, divided by the total number of pairs.
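
The pair-counting definition can be sketched directly. This version implements exactly the formula stated here (often called tau-a) and simply skips tied pairs in the numerator; library implementations frequently use a tie-adjusted variant instead, so results may differ when ties are present. The function name and sample data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    pairs = list(combinations(range(len(xs)), 2))
    concordant = 0
    discordant = 0
    for i, j in pairs:
        # The sign of the product tells whether both variables move the same way.
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair: 5 concordant, 1 discordant out of 6

tau = kendall_tau_a(x, y)
print(tau)
```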

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proven to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
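
A bias-corrected Cramér's V can be sketched as follows. This is an illustrative plain-Python version of the commonly cited correction attributed to Bergsma (2013), not necessarily the exact code behind this report; the function name and sample data are assumptions.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_count in row_totals.items():
        for b, col_count in col_totals.items():
            expected = row_count * col_count / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Correction terms: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Perfect association: each x label always pairs with the same y label.
v_perfect = cramers_v_corrected(["a", "b"] * 50, ["u", "v"] * 50)
# Independence: every combination occurs equally often.
v_indep = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["u", "v", "u", "v"] * 25)
print(v_perfect, v_indep)
```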

Missing values

Sample

First rows

index | Date | AveragePrice | Total Volume | 4046 | 4225 | 4770 | Total Bags | Small Bags | Large Bags | XLarge Bags | type | year | region
0 | 2015-01-04 | 1.22 | 40873.28 | 2819.50 | 28287.42 | 49.90 | 9716.46 | 9186.93 | 529.53 | 0.0 | conventional | 2015 | Albany
1 | 2015-01-11 | 1.24 | 41195.08 | 1002.85 | 31640.34 | 127.12 | 8424.77 | 8036.04 | 388.73 | 0.0 | conventional | 2015 | Albany
2 | 2015-01-18 | 1.17 | 44511.28 | 914.14 | 31540.32 | 135.77 | 11921.05 | 11651.09 | 269.96 | 0.0 | conventional | 2015 | Albany
3 | 2015-01-25 | 1.06 | 45147.50 | 941.38 | 33196.16 | 164.14 | 10845.82 | 10103.35 | 742.47 | 0.0 | conventional | 2015 | Albany
4 | 2015-02-01 | 0.99 | 70873.60 | 1353.90 | 60017.20 | 179.32 | 9323.18 | 9170.82 | 152.36 | 0.0 | conventional | 2015 | Albany
5 | 2015-02-08 | 0.99 | 51253.97 | 1357.37 | 39111.81 | 163.25 | 10621.54 | 10113.10 | 508.44 | 0.0 | conventional | 2015 | Albany
6 | 2015-02-15 | 1.06 | 41567.62 | 986.66 | 30045.51 | 222.42 | 10313.03 | 9979.87 | 333.16 | 0.0 | conventional | 2015 | Albany
7 | 2015-02-22 | 1.07 | 45675.05 | 1088.38 | 35056.13 | 151.00 | 9379.54 | 9000.16 | 379.38 | 0.0 | conventional | 2015 | Albany
8 | 2015-03-01 | 0.99 | 55595.74 | 629.46 | 45633.34 | 181.49 | 9151.45 | 8986.06 | 165.39 | 0.0 | conventional | 2015 | Albany
9 | 2015-03-08 | 1.07 | 40507.36 | 795.68 | 30370.64 | 159.05 | 9181.99 | 8827.55 | 354.44 | 0.0 | conventional | 2015 | Albany

Last rows

index | Date | AveragePrice | Total Volume | 4046 | 4225 | 4770 | Total Bags | Small Bags | Large Bags | XLarge Bags | type | year | region
27313 | 2015-10-18 | 2.02 | 7664.36 | 1523.54 | 3491.30 | 0.00 | 2649.52 | 2606.66 | 42.86 | 0.0 | organic | 2015 | WestTexNewMexico
27314 | 2015-10-25 | 2.00 | 6447.44 | 1235.04 | 2895.73 | 0.00 | 2316.67 | 2316.67 | 0.00 | 0.0 | organic | 2015 | WestTexNewMexico
27315 | 2015-11-01 | 1.92 | 7296.25 | 1652.42 | 3123.83 | 0.00 | 2520.00 | 2520.00 | 0.00 | 0.0 | organic | 2015 | WestTexNewMexico
27316 | 2015-11-08 | 1.98 | 7603.07 | 2198.14 | 3139.24 | 26.37 | 2239.32 | 2223.34 | 15.98 | 0.0 | organic | 2015 | WestTexNewMexico
27317 | 2015-11-15 | 1.92 | 8175.94 | 1925.21 | 3271.43 | 16.72 | 2962.58 | 2946.66 | 15.92 | 0.0 | organic | 2015 | WestTexNewMexico
27318 | 2015-11-22 | 1.97 | 6249.43 | 1733.40 | 2873.92 | 30.95 | 1611.16 | 1590.00 | 21.16 | 0.0 | organic | 2015 | WestTexNewMexico
27319 | 2015-11-29 | 2.08 | 4638.10 | 1395.02 | 2238.04 | 61.71 | 943.33 | 943.33 | 0.00 | 0.0 | organic | 2015 | WestTexNewMexico
27320 | 2015-12-13 | 1.80 | 7836.65 | 2194.49 | 2981.01 | 25.97 | 2635.18 | 2598.45 | 36.73 | 0.0 | organic | 2015 | WestTexNewMexico
27321 | 2015-12-20 | 1.92 | 6255.19 | 1512.45 | 2407.32 | 11.78 | 2323.64 | 2213.72 | 109.92 | 0.0 | organic | 2015 | WestTexNewMexico
27322 | 2015-12-27 | 1.81 | 7155.63 | 1478.79 | 2629.64 | 14.10 | 3033.10 | 2855.55 | 177.55 | 0.0 | organic | 2015 | WestTexNewMexico

Duplicate rows

Most frequent

index | Date | AveragePrice | Total Volume | 4046 | 4225 | 4770 | Total Bags | Small Bags | Large Bags | XLarge Bags | type | year | region | count
0 | 2017-02-26 | 1.16 | 39054.83 | 3021.26 | 15568.68 | 11.77 | 20453.12 | 20299.52 | 153.60 | 0.0 | organic | 2017 | West Tex/New Mexico | 2
1 | 2017-03-05 | 1.23 | 24969.30 | 2292.42 | 4876.69 | 52.82 | 17747.37 | 17114.89 | 632.48 | 0.0 | organic | 2017 | West Tex/New Mexico | 2